Di Wen

Seeing Together: Multi-Robot Cooperative Egocentric Spatial Reasoning with Multimodal Large Language Models

May 19, 2026
Via arXiv

EgoExoMem: Cross-View Memory Reasoning over Synchronized Egocentric and Exocentric Videos

May 18, 2026
Via arXiv

IMPACT-CYCLE: A Contract-Based Multi-Agent System for Claim-Level Supervisory Correction of Long-Video Semantic Memory

Apr 22, 2026
Via arXiv

Rethinking Video Human-Object Interaction: Set Prediction over Time for Unified Detection and Anticipation

Apr 12, 2026
Via arXiv

IMPACT: A Dataset for Multi-Granularity Human Procedural Action Understanding in Industrial Assembly

Apr 12, 2026
Via arXiv

Towards Multi-Source Domain Generalization for Sleep Staging with Noisy Labels

Apr 11, 2026
Via arXiv

ProOOD: Prototype-Guided Out-of-Distribution 3D Occupancy Prediction

Apr 01, 2026
Via arXiv

Not an Obstacle for Dog, but a Hazard for Human: A Co-Ego Navigation System for Guide Dog Robots

Mar 20, 2026
Via arXiv

InterEdit: Navigating Text-Guided Multi-Human 3D Motion Editing

Mar 13, 2026
Via arXiv

$M^2$-Occ: Resilient 3D Semantic Occupancy Prediction for Autonomous Driving with Incomplete Camera Inputs

Mar 10, 2026
Via arXiv